Spanish Keyword Spotting System Based on Filler Models, Pseudo N-gram Language Model and a Confidence Measure

Authors

  • Javier Tejedor
  • José Colás
Abstract

Efficiently organizing many hours of audio content, such as meetings and radio news, requires the ability to search for spoken keywords. One approach uses filler models to account for non-keyword intervals. Another uses a large-vocabulary continuous speech recognition (LVCSR) system, which produces a word string that is then searched for the keywords. The latter approach yields high performance but requires a large amount of training data and costly computation. In this paper we present several filler models and a confidence measure explored in a Spanish keyword spotting system. We also investigate different weights in the grammar used for language modelling in the keyword spotting system in order to achieve the best results. The keyword spotting technique is based on Hidden Markov Models (HMMs). Test results are reported on a set of data from the geographic corpus of the Albayzin speech database, using 80 keywords chosen from the words that occur most frequently in the corpus sentences.
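
The abstract does not say how the grammar weights or the confidence measure are defined, so the following is only a minimal sketch under stated assumptions. It assumes that a decoder has already produced log-likelihoods for a speech segment under a keyword HMM and under a filler model, that the grammar weight is a probability split between the keyword and filler branches, and that the confidence measure is a frame-normalized log-likelihood ratio. All function names, parameters, and the threshold are hypothetical and are not taken from the paper.

```python
import math


def weighted_score(acoustic_loglik, grammar_weight):
    """Add a log grammar weight (assumed to be a probability) to an acoustic log-likelihood."""
    if not 0.0 < grammar_weight < 1.0:
        raise ValueError("grammar_weight must lie strictly between 0 and 1")
    return acoustic_loglik + math.log(grammar_weight)


def confidence(keyword_score, filler_score, n_frames):
    """Frame-normalized log-likelihood ratio (an assumed confidence measure)."""
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return (keyword_score - filler_score) / n_frames


def accept_keyword(keyword_loglik, filler_loglik, n_frames,
                   keyword_weight=0.5, threshold=0.0):
    """Accept a hypothesized keyword if its weighted score beats the filler's by the threshold."""
    kw = weighted_score(keyword_loglik, keyword_weight)
    fl = weighted_score(filler_loglik, 1.0 - keyword_weight)
    return confidence(kw, fl, n_frames) > threshold
```

In this sketch, raising `keyword_weight` biases the decision toward detecting keywords (more hits, more false alarms), while raising `threshold` makes the spotter more conservative. Tuning both on held-out data is one plausible reading of the weight investigation the abstract mentions.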

Similar resources

Lexical Access-based Confidence Measure for a Spanish Keyword Spotting System

Keyword spotting deals with the search for a reduced set of keywords in audio content. Phone lattice-based approaches are very fast but achieve poor results. HMM-based keyword spotting systems use filler models to absorb out-of-vocabulary (OOV) words and achieve the best results, although they are slower. We propose a technique which combines them in order to perform a confidence measure to...

Keyword spotting for highly inflectional languages

This paper presents our new keyword spotting system, which takes advantage of both the filler model and the confidence measure approaches. The novelty lies in a non-standard connection of the filler and keyword models, together with the introduction of a new confidence measure based on a keyword normalized score. In detail, the paper deals with a decision block. Two methods are introduced. The first is b...

Using phonological phrase segmentation to improve automatic keyword spotting for the highly agglutinating Hungarian language

This paper investigates the use of prosody to improve keyword spotting, focusing on the highly agglutinating Hungarian language, where keyword spotting cannot be performed effectively using LVCSR, as such systems are either unavailable or hard to operate due to high OOV rates and poor N-gram language modelling capabilities. Therefore, the applied keyword spotting system is based on...

Improving performance of a keyword spotting system by using a new confidence measure

This work describes an HMM-based keyword spotting system. In this system, keywords are modeled as concatenations of the corresponding phoneme models; consequently, no specific databases are needed to train the system. In addition, no filler models are required, so the computational requirements are small. Two main stages define the whole system. The first stage is based on a previous...

Out-of-Vocabulary Word Modeling and Rejection for Spanish Keyword Spotting Systems

This paper presents a combination of out-of-vocabulary (OOV) word modeling and rejection techniques in an attempt to accept utterances embedding a keyword and reject utterances with nonkeywords. The goal of this research is to develop a robust, task-independent Spanish keyword spotter and to develop a method for optimizing confidence thresholds for a particular context. To model OOV words, we e...

Publication date: 2006